MobILE: Model-Based Imitation Learning From Observation Alone
This paper studies Imitation Learning from Observations alone (ILFO), where the learner is presented with expert demonstrations that consist only of states visited by an expert (without access to the actions taken by the expert). We present a provably efficient model-based framework, MobILE, to solve the ILFO problem. MobILE involves carefully trading off exploration against imitation; this is achieved by integrating the idea of optimism in the face of uncertainty into the distribution matching imitation learning (IL) framework. We provide a unified analysis for MobILE, and demonstrate that MobILE enjoys strong performance guarantees for classes of MDP dynamics that satisfy certain well-studied notions of complexity. We also show that the ILFO problem is strictly harder than the standard IL problem by reducing ILFO to a multi-armed bandit problem, indicating that exploration is necessary for solving ILFO efficiently. We complement these theoretical results with experimental simulations on benchmark OpenAI Gym tasks that indicate the efficacy of MobILE.
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.98)
- Information Technology > Artificial Intelligence > Robots (0.90)
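The MobILE abstract above describes combining optimism in the face of uncertainty with distribution matching. A minimal sketch of that idea follows, assuming a tabular setting with a count-based exploration bonus. The count-based bonus, the function names, and the sample-based loss are illustrative assumptions, not the paper's construction; the abstract does not specify how uncertainty is measured.

```python
import numpy as np


def count_bonus(counts, scale=1.0):
    """Hypothetical exploration bonus: large for rarely visited
    state-action pairs, shrinking as the visit count grows."""
    counts = np.asarray(counts, dtype=float)
    # Clamp at 1 so unvisited pairs get the maximum bonus instead of inf.
    return scale / np.sqrt(np.maximum(counts, 1.0))


def optimistic_imitation_loss(f_learner, f_expert, bonus_learner):
    """Sample estimate of a state-distribution-matching gap with optimism.

    f_learner:     discriminator values f(s) on states visited by the learner
    f_expert:      discriminator values f(s) on expert states
    bonus_learner: bonus at the learner's state-action pairs
                   (same length as f_learner)

    Subtracting the bonus makes uncertain regions look cheaper to the
    learner, which encourages exploration while it imitates.
    """
    learner_term = np.mean(np.asarray(f_learner, dtype=float)
                           - np.asarray(bonus_learner, dtype=float))
    expert_term = np.mean(np.asarray(f_expert, dtype=float))
    return float(learner_term - expert_term)
```

With no bonus the loss reduces to the plain gap between learner and expert discriminator values; a larger bonus lowers the loss on poorly explored pairs, which is the exploration-versus-imitation trade-off the abstract refers to.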
Provable Representation Learning for Imitation Learning via Bi-level Optimization
Arora, Sanjeev, Du, Simon S., Kakade, Sham, Luo, Yuping, Saunshi, Nikunj
A common strategy in modern learning systems is to learn a representation that is useful for many tasks, a.k.a. representation learning. We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts' trajectories are available. We formulate representation learning as a bi-level optimization problem where the "outer" optimization tries to learn the joint representation and the "inner" optimization encodes the imitation learning setup and tries to learn task-specific parameters. We instantiate this framework for the imitation learning settings of behavior cloning and observation-alone. Theoretically, we show using our framework that representation learning can provide sample complexity benefits for imitation learning in both settings. We also provide proof-of-concept experiments to verify our theory.
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
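The bi-level formulation described in the abstract of "Provable Representation Learning for Imitation Learning via Bi-level Optimization" can be sketched as follows. This is a toy illustration assuming a linear shared representation `B` and a squared-error inner problem solved by least squares; these choices and all names are assumptions made for the example, not the paper's setup.

```python
import numpy as np


def inner_solve(B, X, y):
    """Inner problem: with the shared representation B fixed, fit the
    task-specific weights w by least squares on the features X @ B."""
    features = X @ B
    w, *_ = np.linalg.lstsq(features, y, rcond=None)
    loss = float(np.mean((features @ w - y) ** 2))
    return w, loss


def outer_loss(B, tasks):
    """Outer objective: average of the best achievable inner loss over
    all tasks. The outer optimizer would search over B to minimize this."""
    return float(np.mean([inner_solve(B, X, y)[1] for X, y in tasks]))


# Toy data: two tasks whose targets depend only on the first input coordinate.
X = np.eye(4)
tasks = [(X, np.array([2.0, 0.0, 0.0, 0.0])),
         (X, np.array([-1.0, 0.0, 0.0, 0.0]))]

B_good = np.array([[1.0], [0.0], [0.0], [0.0]])  # exposes the useful coordinate
B_bad = np.array([[0.0], [1.0], [0.0], [0.0]])   # exposes an irrelevant one
```

Here `outer_loss(B_good, tasks)` is zero because every task can be fit from the shared feature, while `outer_loss(B_bad, tasks)` is positive; a shared representation that serves all tasks is what the outer optimization is meant to find.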
A New AI Learns Through Observation Alone: What That Means for Drone Surveillance
A breakthrough will allow machines to learn by observing. This technique, which its inventors have named Turing Learning, promises smarter drones that could detect militants engaging in behavior that could endanger troops, like planting roadside bombs. Still in its infancy, the new machine learning technique is named for British mathematician Alan Turing, whose famous test challenges artificial intelligences to fool a human into thinking he or she is conversing with another human. In Turing Learning, a program dubbed the "classifier" tries to learn about a system designed to fool it. In certain ways, Turing Learning resembles many existing machine-learning systems.
- Information Technology > Security & Privacy (0.86)
- Government > Military > Air Force (0.53)